Mako hSDM BRT preliminary results

Author

Emily Nazario

Published

August 12, 2024

Study hypotheses and objectives

H1: The AGI model will perform better than the dissolved oxygen and null model, and the dissolved oxygen model will perform better than the null model.

study objective being met: Which model performs the best and presents the best predictions (i.e., best predictive performance scores, most ecologically realistic suitability maps)?

H2: The inclusion of dissolved oxygen at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the dissolved oxygen model considering surface values alone.

study objective being met: How does dissolved oxygen at different depths influence habitat suitability predictions relative to oxygen at the surface?

H3: The inclusion of the AGI at deeper depths will result in better/more ecologically realistic habitat suitability predictions relative to the AGI model considering surface values alone.

study objective being met: How does the aerobic growth index (AGI; environmental oxygen supply:theoretical oxygen demand) at different depths influence habitat suitability predictions relative to the aerobic growth index at the surface?

H4: There will be important relationships between dissolved oxygen/the AGI and latitude/distance to coast.

study objective being met: Are there any important relationships between dissolved oxygen or AGI at the surface or at depth and latitude or distance to the coast?

H5: The null model will predict higher habitat suitability in areas or during seasons or periods (upwelling or La Niña) with lower dissolved oxygen through the water column relative to the dissolved oxygen and AGI models.

study objective being met: How do the habitat suitability maps differ between the models? How do these variations compare for different points in time?

Model summaries

explore_brt(mod_file_path = brt_outputs_crw[7], 
            test_data = base_test_daily_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.7986447
Correlation        0.7174286
AUC                0.9148000
Per.Expl          42.3894599
cvDeviance         1.0127285
cvCorrelation      0.5642493
cvAUC              0.8220800
cvPer.Expl        26.9464423
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 27.367481
temp_mean  18.477169
sal_mean   10.540214
chl_mean    8.710175
ssh_mean    6.015610
mld_mean    5.861958
vostr_mean  5.303627
bathy_sd    5.270798
vo_mean     3.496298
uo_mean     3.392592
uostr_mean  2.849782
pred_var    2.714296
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   363.44
2          6    vo_mean          3   sal_mean   193.16
3         12   pred_var          4    uo_mean   168.67
4          8   ssh_mean          2  temp_mean   162.10
5          2  temp_mean          1   chl_mean   129.68
6         10 bathy_mean          8   ssh_mean   110.34
7          8   ssh_mean          1   chl_mean    98.98
[1] "External percent deviance explained"
[1] 0.3850404

[1] "TPR"
[1] 0.6952086
[1] "TSS"
[1] 0.6133057
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4150 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3706824 0.6788681 0.8917619  1.003817         0.3850404 0.4238946
explore_brt(mod_file_path = brt_outputs_back[7], 
            test_data = base_test_daily_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862741
Residual.Deviance  0.2948092
Correlation        0.9249630
AUC                0.9922000
Per.Expl          78.7336988
cvDeviance         0.5909966
cvCorrelation      0.8025147
cvAUC              0.9464300
cvPer.Expl        57.3679835
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 37.726838
temp_mean  23.806676
sal_mean    7.021355
chl_mean    5.980357
ssh_mean    5.413233
uostr_mean  5.244057
vostr_mean  3.838429
bathy_sd    2.871536
mld_mean    2.581925
uo_mean     2.392186
vo_mean     1.868975
pred_var    1.254433
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1         10 bathy_mean          2  temp_mean   835.60
2         10 bathy_mean          8   ssh_mean   650.18
3          8   ssh_mean          2  temp_mean   556.11
4         10 bathy_mean          3   sal_mean   496.83
5         10 bathy_mean          4    uo_mean   406.37
6          3   sal_mean          2  temp_mean   343.56
7          8   ssh_mean          1   chl_mean   337.30
[1] "External percent deviance explained"
[1] 0.7242147

[1] "TPR"
[1] 0.7393556
[1] "TSS"
[1] 0.8700281
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4250 iterations were performed.
There were 12 predictors of which 12 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained PseudoR2
1 0.2312832 0.8884673 0.9794331 0.9980034         0.7242147 0.787337
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862598
Residual.Deviance  0.1784155
Correlation        0.9620158
AUC                0.9980000
Per.Expl          87.1297181
cvDeviance         0.4443144
cvCorrelation      0.8616943
cvAUC              0.9680100
cvPer.Expl        67.9486949
[1] "Relative influence of predictor variables"

                rel.inf
bathy_mean    29.410142
temp_mean     22.090687
AGI_250m_seas  8.885923
AGI_0m         7.036726
AGI_0m_seas    4.031791
sal_mean       3.863634
AGI_250m_ann   3.717057
AGI_250m       3.185755
AGI_60m_ann    3.018782
ssh_mean       2.964206
chl_mean       2.686814
AGI_60m_seas   2.575530
AGI_0m_ann     1.639762
AGI_60m        1.581991
bathy_sd       1.553865
mld_mean       1.073780
pred_var       0.683555
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  3465.55
2          15    AGI_0m_ann         12 AGI_0m_seas   386.42
3           6    bathy_mean          3    sal_mean   359.44
4          16   AGI_60m_ann          6  bathy_mean   297.40
5          16   AGI_60m_ann         12 AGI_0m_seas   290.80
6          17  AGI_250m_ann          3    sal_mean   258.66
7          14 AGI_250m_seas          6  bathy_mean   192.39
8          14 AGI_250m_seas          2   temp_mean   163.51
9           8        AGI_0m          4    ssh_mean   162.57
10          6    bathy_mean          2   temp_mean   154.13
11         14 AGI_250m_seas         10    AGI_250m   153.38
12          3      sal_mean          2   temp_mean   150.76
13         12   AGI_0m_seas          2   temp_mean   123.54
14          8        AGI_0m          3    sal_mean   103.44
[1] "External percent deviance explained"
[1] -2.981096

[1] "TPR"
[1] 0.3431204
[1] "TSS"
[1] 0
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9050 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE        Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.8144573 -0.5895665 0.1853505 0.6929731         -2.981096 0.8712972
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.5891149
Correlation        0.8163435
AUC                0.9606000
Per.Expl          57.5041190
cvDeviance         0.8440735
cvCorrelation      0.6692730
cvAUC              0.8827300
cvPer.Expl        39.1126493
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.287654
o2_mean_250m_ann  14.631246
o2_mean_0m_seas    8.910687
o2_mean_250m_seas  7.142539
temp_mean          6.627435
o2_mean_60m_ann    5.495030
o2_mean_60m_seas   5.325783
bathy_mean         4.964121
sal_mean           4.592187
o2_mean_250m       4.465235
chl_mean           4.270698
o2_mean_0m_ann     3.305816
o2_mean_60m        3.042965
ssh_mean           2.916029
mld_mean           2.724895
bathy_sd           2.244270
pred_var           2.053409
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   508.25
2          12   o2_mean_0m_seas          1   o2_mean_0m   301.24
3          14 o2_mean_250m_seas         10 o2_mean_250m   166.76
4          12   o2_mean_0m_seas          5     ssh_mean   150.68
5          12   o2_mean_0m_seas          2     chl_mean   122.58
6           4          sal_mean          3    temp_mean   101.22
7          15    o2_mean_0m_ann          4     sal_mean    89.73
8          16   o2_mean_60m_ann          7   bathy_mean    86.63
9           9       o2_mean_60m          7   bathy_mean    81.59
10          9       o2_mean_60m          8     bathy_sd    77.18
11         11          pred_var          7   bathy_mean    66.90
12          7        bathy_mean          3    temp_mean    65.75
13         12   o2_mean_0m_seas          4     sal_mean    63.82
14         13  o2_mean_60m_seas          4     sal_mean    63.74
[1] "External percent deviance explained"
[1] 0.5285492

[1] "TPR"
[1] 0.7203563
[1] "TSS"
[1] 0.7319403
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5650 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3166705 0.7799527 0.9421311  1.001803         0.5285492 0.5750412
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862598
Residual.Deviance  0.1725199
Correlation        0.9639600
AUC                0.9983000
Per.Expl          87.5550071
cvDeviance         0.4451753
cvCorrelation      0.8607369
cvAUC              0.9680800
cvPer.Expl        67.8865900
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    29.6072816
temp_mean     21.4544712
AGI_250m_seas  9.0847918
AGI_0m         7.0179705
AGI_0m_seas    4.0172439
sal_mean       4.0041197
AGI_250m_ann   3.7156705
ssh_mean       3.5991848
chl_mean       2.8199131
AGI_60m_ann    2.7817580
AGI_250m       2.7006117
AGI_60m_seas   2.6378381
AGI_0m_ann     1.7398823
AGI_60m        1.5526407
bathy_sd       1.4789096
mld_mean       1.0854342
pred_var       0.7022784
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  3569.24
2          15    AGI_0m_ann         12 AGI_0m_seas   661.16
3           6    bathy_mean          3    sal_mean   530.25
4          16   AGI_60m_ann         12 AGI_0m_seas   279.96
5          16   AGI_60m_ann          6  bathy_mean   246.01
6          17  AGI_250m_ann          3    sal_mean   210.57
7           8        AGI_0m          4    ssh_mean   192.50
8          14 AGI_250m_seas          2   temp_mean   176.08
9           3      sal_mean          2   temp_mean   173.66
10          6    bathy_mean          2   temp_mean   166.93
11          8        AGI_0m          6  bathy_mean   141.44
12         12   AGI_0m_seas          2   temp_mean   131.43
13         14 AGI_250m_seas         10    AGI_250m   125.88
14          9       AGI_60m          6  bathy_mean   125.25
[1] "External percent deviance explained"
[1] 0.8155907

[1] "TPR"
[1] 0.745645
[1] "TSS"
[1] 0.9236973
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
9350 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1819719 0.9327983 0.9904085 0.9914034         0.8155907 0.8755501
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862244
Residual.Deviance  0.2107719
Correlation        0.9504410
AUC                0.9963000
Per.Expl          84.7952550
cvDeviance         0.4559794
cvCorrelation      0.8564559
cvAUC              0.9667500
cvPer.Expl        67.1063781
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        26.7211943
o2_mean_0m        20.7583985
o2_mean_0m_seas    9.2882523
o2_mean_250m_seas  8.6714061
o2_mean_250m_ann   5.7739633
o2_mean_60m_seas   4.8036871
o2_mean_0m_ann     3.1994781
o2_mean_60m_ann    3.1542770
chl_mean           2.9571143
temp_mean          2.8154229
ssh_mean           2.5370378
sal_mean           2.1503543
o2_mean_250m       2.1428240
o2_mean_60m        1.9018201
bathy_sd           1.2755329
mld_mean           1.1979952
pred_var           0.6512417
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index       var2.names int.size
1           3         temp_mean          1       o2_mean_0m   524.36
2          16   o2_mean_60m_ann          7       bathy_mean   314.25
3           9       o2_mean_60m          7       bathy_mean   225.85
4           2          chl_mean          1       o2_mean_0m   218.47
5          14 o2_mean_250m_seas         10     o2_mean_250m   204.52
6           7        bathy_mean          5         ssh_mean   184.82
7           4          sal_mean          3        temp_mean   163.93
8          14 o2_mean_250m_seas          1       o2_mean_0m   144.74
9          16   o2_mean_60m_ann         13 o2_mean_60m_seas   126.52
10          7        bathy_mean          4         sal_mean   120.32
11          7        bathy_mean          2         chl_mean   109.90
12         12   o2_mean_0m_seas          3        temp_mean    98.20
13         14 o2_mean_250m_seas          2         chl_mean    97.60
14          9       o2_mean_60m          3        temp_mean    89.49
[1] "External percent deviance explained"
[1] 0.7936232

[1] "TPR"
[1] 0.7443195
[1] "TSS"
[1] 0.9059035
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7500 iterations were performed.
There were 17 predictors of which 17 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.1963482 0.9210228 0.9883101 0.9952783         0.7936232 0.8479526
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6277548
Correlation        0.7973163
AUC                0.9522000
Per.Expl          54.7169289
cvDeviance         0.8617301
cvCorrelation      0.6582930
cvAUC              0.8768100
cvPer.Expl        37.8391330
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     15.504668
AGI_250m_ann  15.163685
AGI_0m        11.740021
bathy_mean    11.710734
AGI_0m_seas    8.197830
sal_mean       6.807123
AGI_250m_seas  6.751818
AGI_0m_ann     5.161877
chl_mean       5.090897
AGI_250m       4.624616
bathy_sd       3.784137
mld_mean       3.106183
pred_var       2.356411
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index    var2.names int.size
1          7        AGI_0m          2     temp_mean  2843.30
2          7        AGI_0m          5    bathy_mean   355.65
3         12    AGI_0m_ann         10   AGI_0m_seas   197.02
4         10   AGI_0m_seas          8      AGI_250m   181.68
5         13  AGI_250m_ann         12    AGI_0m_ann   165.05
6         11 AGI_250m_seas          8      AGI_250m   149.59
7         10   AGI_0m_seas          2     temp_mean   136.02
8         13  AGI_250m_ann         11 AGI_250m_seas   122.24
[1] "External percent deviance explained"
[1] 0.4999456

[1] "TPR"
[1] 0.7155658
[1] "TSS"
[1] 0.7086853
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5450 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
      RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.328907 0.7586342 0.9325364  1.007354         0.4999456 0.5471693
explore_brt(mod_file_path = "data/brt/mod_outputs/crw/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6163731
Correlation        0.8037426
AUC                0.9552000
Per.Expl          55.5378474
cvDeviance         0.8577154
cvCorrelation      0.6617449
cvAUC              0.8786300
cvPer.Expl        38.1285877
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.645183
o2_mean_250m_ann  17.206276
o2_mean_0m_seas   10.345249
temp_mean          7.771990
o2_mean_250m_seas  7.199949
bathy_mean         7.171107
sal_mean           6.275883
chl_mean           5.065733
o2_mean_0m_ann     4.840965
o2_mean_250m       4.413205
ssh_mean           4.028459
mld_mean           3.069241
bathy_sd           2.700490
pred_var           2.266270
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   510.24
2          13    o2_mean_0m_ann          3    temp_mean   283.65
3          12 o2_mean_250m_seas          4     sal_mean   254.65
4          11   o2_mean_0m_seas          1   o2_mean_0m   212.12
5           4          sal_mean          3    temp_mean   189.20
6          11   o2_mean_0m_seas          5     ssh_mean   181.69
7           7        bathy_mean          3    temp_mean   158.82
8          13    o2_mean_0m_ann          4     sal_mean   151.35
9          12 o2_mean_250m_seas          9 o2_mean_250m   148.24
10          5          ssh_mean          4     sal_mean   132.70
[1] "External percent deviance explained"
[1] 0.5114889

[1] "TPR"
[1] 0.7177333
[1] "TSS"
[1] 0.7216508
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3235065 0.7685859 0.9368822  1.002484         0.5114889 0.5553785
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862598
Residual.Deviance  0.2178641
Correlation        0.9487621
AUC                0.9963000
Per.Expl          84.2840333
cvDeviance         0.4666383
cvCorrelation      0.8522336
cvAUC              0.9651100
cvPer.Expl        66.3383249
[1] "Relative influence of predictor variables"

                 rel.inf
bathy_mean    31.0327439
temp_mean     21.5126289
AGI_250m_seas 10.5932242
AGI_0m         7.2883846
ssh_mean       5.4945038
sal_mean       4.4691331
AGI_0m_seas    4.2234033
AGI_250m_ann   3.4503058
chl_mean       3.1456337
AGI_250m       2.9512860
AGI_0m_ann     1.9547153
bathy_sd       1.8211027
mld_mean       1.2416308
pred_var       0.8213038
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  4018.99
2           6    bathy_mean          3    sal_mean   336.19
3          12 AGI_250m_seas          3    sal_mean   317.17
4           6    bathy_mean          2   temp_mean   283.99
5          12 AGI_250m_seas          9    AGI_250m   279.87
6          13    AGI_0m_ann         11 AGI_0m_seas   264.70
7          14  AGI_250m_ann          3    sal_mean   214.18
8           3      sal_mean          2   temp_mean   207.16
9          12 AGI_250m_seas          6  bathy_mean   201.20
10          8        AGI_0m          4    ssh_mean   190.66
[1] "External percent deviance explained"
[1] 0.7826156

[1] "TPR"
[1] 0.7437599
[1] "TSS"
[1] 0.899488
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
7850 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2020269 0.9161384 0.9870229  0.991535         0.7826156 0.8428403
explore_brt(mod_file_path = "data/brt/mod_outputs/background/refined/brt_do_0m_250m_dail_seas_ann.rds",
            test_data = do_test_daily_seasonal_annual_back)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862244
Residual.Deviance  0.2156559
Correlation        0.9488750
AUC                0.9962000
Per.Expl          84.4429321
cvDeviance         0.4735685
cvCorrelation      0.8497414
cvAUC              0.9644500
cvPer.Expl        65.8375303
[1] "Relative influence of predictor variables"

                     rel.inf
bathy_mean        28.1527411
o2_mean_0m        21.5616098
o2_mean_250m_seas 12.6044992
o2_mean_0m_seas    8.9700418
o2_mean_250m_ann   6.4551558
o2_mean_0m_ann     3.9034438
temp_mean          3.5445843
chl_mean           3.2837754
sal_mean           2.6484189
ssh_mean           2.6018368
o2_mean_250m       2.3095329
bathy_sd           1.6979623
mld_mean           1.4124722
pred_var           0.8539255
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   738.70
2          12 o2_mean_250m_seas          9 o2_mean_250m   485.27
3          12 o2_mean_250m_seas          1   o2_mean_0m   272.32
4           7        bathy_mean          4     sal_mean   267.15
5          11   o2_mean_0m_seas          5     ssh_mean   246.13
6           2          chl_mean          1   o2_mean_0m   214.51
7           7        bathy_mean          3    temp_mean   202.68
8           7        bathy_mean          5     ssh_mean   194.75
9          12 o2_mean_250m_seas          4     sal_mean   193.45
10         11   o2_mean_0m_seas          7   bathy_mean   154.83
[1] "External percent deviance explained"
[1] 0.7862912

[1] "TPR"
[1] 0.7440081
[1] "TSS"
[1] 0.8961138
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
8000 iterations were performed.
There were 14 predictors of which 14 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.2012221 0.9167496 0.9876593   0.99309         0.7862912 0.8444293

Summary table of model performance metrics

model percent_explained deviance_exp TPR_mean TSS AUC RMSE SpearmanCor PseudoR2
base_Nwind_crw 36.455 0.349 0.683 0.556 0.867 0.385 0.640 0.365
base_Nwind_back 78.734 0.724 0.739 0.870 0.979 0.231 0.888 0.787
agi_alldep_allres_crw 57.743 0.526 0.719 0.730 0.940 0.318 0.777 0.577
do_alldep_allres_crw 57.504 0.529 0.720 0.732 0.942 0.317 0.780 0.575
agi_alldep_allres_back 87.555 0.816 0.746 0.924 0.990 0.182 0.933 0.876
do_alldep_allres_back 84.795 0.794 0.744 0.906 0.988 0.196 0.921 0.848
agi_0m_250m_dail_seas_ann_crw 55.077 0.503 0.716 0.703 0.932 0.328 0.760 0.551
do_0m_250m_dail_seas_ann_crw 55.538 0.511 0.718 0.722 0.937 0.324 0.769 0.555
agi_0m_250m_dail_seas_ann_back 84.284 0.783 0.744 0.899 0.987 0.202 0.916 0.843
do_0m_250m_dail_seas_ann_back 84.443 0.786 0.744 0.896 0.988 0.201 0.917 0.844

Conclusions

  • Based on the above comparisons between models using PAs from CRW and background sampling approaches, I have decided to move forward using the CRW PA models. The background sampling models consistently indicated that they were overfit, and thus the CRW models are the preferred choice.

  • Given the similar values among performance metrics between the overfit models (those with all depth levels and temporal resolutions) and the models without data at 60m, we have selected the latter to be the final study models. We explored similar models without data at 60m and at a seasonal temporal resolution, but these models performed considerably worse. Additionally, DO and AGI at 0m and 250m at a seasonal resolution were often a predictor variable with a top 5 highest relative importance, thus, keeping data at a seasonal resolution seemed valuable.

Final models w/o random predictor variables

explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_base_0m_dail_no_wind.rds",
            test_data = base_test_daily_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862823
Residual.Deviance  0.8160352
Correlation        0.7038050
AUC                0.9065000
Per.Expl          41.1349935
cvDeviance         0.9973663
cvCorrelation      0.5753416
cvAUC              0.8287700
cvPer.Expl        28.0546057
[1] "Relative influence of predictor variables"

             rel.inf
bathy_mean 30.831647
temp_mean  22.940454
sal_mean   12.777564
chl_mean   10.523532
bathy_sd    7.978901
ssh_mean    7.803239
mld_mean    7.144664
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index var1.names var2.index var2.names int.size
1          6 bathy_mean          2  temp_mean   730.22
2          4   ssh_mean          2  temp_mean   560.59
[1] "External percent deviance explained"
[1] 0.3774799

[1] "TPR"
[1] 0.6925418
[1] "TSS"
[1] 0.5960813
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
4300 iterations were performed.
There were 7 predictors of which 7 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3740063 0.6699421 0.8864212  1.002138         0.3774799 0.4113499
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_do_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = do_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6004545
Correlation        0.8099189
AUC                0.9575000
Per.Expl          56.6861326
cvDeviance         0.8423891
cvCorrelation      0.6697309
cvAUC              0.8832400
cvPer.Expl        39.2341516
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        17.027705
o2_mean_250m_ann  15.896634
o2_mean_0m_seas    9.384843
temp_mean          7.043452
o2_mean_250m_seas  6.239946
o2_mean_60m_ann    5.633626
bathy_mean         5.169538
o2_mean_60m_seas   5.037047
sal_mean           4.915712
o2_mean_250m       4.514210
chl_mean           4.181528
o2_mean_0m_ann     3.503057
o2_mean_60m        3.087269
ssh_mean           3.035695
mld_mean           2.872809
bathy_sd           2.456927
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index        var1.names var2.index   var2.names int.size
1           3         temp_mean          1   o2_mean_0m   421.52
2          11   o2_mean_0m_seas          1   o2_mean_0m   256.74
3          14    o2_mean_0m_ann          4     sal_mean   210.40
4          13 o2_mean_250m_seas         10 o2_mean_250m   190.15
5          11   o2_mean_0m_seas          5     ssh_mean   144.83
6           7        bathy_mean          3    temp_mean   138.68
7           5          ssh_mean          4     sal_mean   124.56
8           4          sal_mean          3    temp_mean   115.46
9          11   o2_mean_0m_seas          4     sal_mean   107.40
10         15   o2_mean_60m_ann          7   bathy_mean   105.21
11          9       o2_mean_60m          7   bathy_mean   105.03
12         12  o2_mean_60m_seas          9  o2_mean_60m    94.94
13          9       o2_mean_60m          8     bathy_sd    94.79
[1] "External percent deviance explained"
[1] 0.5236399

[1] "TPR"
[1] 0.7194688
[1] "TSS"
[1] 0.7276229
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5400 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3188455 0.7760116 0.9403423  1.001684         0.5236399 0.5668613
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_agi_0m_60m_250m_dail_seas_ann_no_wind.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.5858010
Correlation        0.8161710
AUC                0.9602000
Per.Expl          57.7432667
cvDeviance         0.8309178
cvCorrelation      0.6751996
cvAUC              0.8859900
cvPer.Expl        40.0617726
[1] "Relative influence of predictor variables"

                rel.inf
temp_mean     13.808659
AGI_250m_ann  13.248314
AGI_0m        11.562619
bathy_mean     8.434040
AGI_0m_seas    7.508202
sal_mean       5.742767
AGI_250m_seas  5.436860
AGI_60m_ann    4.943205
AGI_60m_seas   4.543656
AGI_0m_ann     4.409229
AGI_250m       4.252699
ssh_mean       4.180552
chl_mean       3.804112
bathy_sd       2.999297
mld_mean       2.623939
AGI_60m        2.501850
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
   var1.index    var1.names var2.index  var2.names int.size
1           8        AGI_0m          2   temp_mean  1760.25
2          15   AGI_60m_ann         11 AGI_0m_seas   315.74
3           8        AGI_0m          6  bathy_mean   235.87
4          11   AGI_0m_seas         10    AGI_250m   189.96
5          16  AGI_250m_ann         14  AGI_0m_ann   183.28
6           8        AGI_0m          4    ssh_mean   145.41
7           8        AGI_0m          3    sal_mean   141.05
8          13 AGI_250m_seas         10    AGI_250m   109.75
9          14    AGI_0m_ann         11 AGI_0m_seas   106.67
10          9       AGI_60m          6  bathy_mean    98.24
11         12  AGI_60m_seas          4    ssh_mean    96.58
12         14    AGI_0m_ann          7    bathy_sd    92.19
13         15   AGI_60m_ann          4    ssh_mean    73.03
[1] "External percent deviance explained"
[1] 0.5261534

[1] "TPR"
[1] 0.719393
[1] "TSS"
[1] 0.7302788
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5750 iterations were performed.
There were 16 predictors of which 16 had non-zero influence.
       RMSE     Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3183118 0.77655 0.940186  1.009582         0.5261534 0.5774327
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862871
Residual.Deviance  0.6216714
Correlation        0.7997202
AUC                0.9530000
Per.Expl          55.1556520
cvDeviance         0.8550097
cvCorrelation      0.6633890
cvAUC              0.8795600
cvPer.Expl        38.3237656
[1] "Relative influence of predictor variables"

                    rel.inf
o2_mean_0m        18.096412
o2_mean_250m_ann  17.371934
o2_mean_0m_seas   10.555409
o2_mean_250m_seas  8.392439
temp_mean          8.008430
sal_mean           6.610190
bathy_mean         6.609148
chl_mean           5.117774
o2_mean_0m_ann     4.922417
o2_mean_250m       4.500747
ssh_mean           3.912119
mld_mean           3.194189
bathy_sd           2.708790
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index        var1.names var2.index   var2.names int.size
1          3         temp_mean          1   o2_mean_0m   666.21
2         10   o2_mean_0m_seas          1   o2_mean_0m   368.62
3         11 o2_mean_250m_seas          4     sal_mean   296.11
4         12    o2_mean_0m_ann          3    temp_mean   248.07
5         11 o2_mean_250m_seas          9 o2_mean_250m   221.72
6          4          sal_mean          3    temp_mean   220.04
7         10   o2_mean_0m_seas          5     ssh_mean   188.99
8          7        bathy_mean          3    temp_mean   180.12
[1] "External percent deviance explained"
[1] 0.5113989

[1] "TPR"
[1] 0.7174051
[1] "TSS"
[1] 0.712829
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5300 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE       Cor   C-index PredRatio DevianceExplained  PseudoR2
1 0.3240668 0.7672517 0.9361937  1.003678         0.5113989 0.5515565
explore_brt(mod_file_path = "data/brt/mod_outputs/final_mods/brt_agi_0m_250m_dail_seas_ann.rds",
            test_data = agi_test_daily_seasonal_annual_crw)
[1] "Model performance metrics"
                     Model 1
Total.Deviance     1.3862903
Residual.Deviance  0.6227688
Correlation        0.7983866
AUC                0.9524000
Per.Expl          55.0765938
cvDeviance         0.8514835
cvCorrelation      0.6647368
cvAUC              0.8804000
cvPer.Expl        38.5782656
[1] "Relative influence of predictor variables"

                rel.inf
AGI_250m_ann  15.624829
temp_mean     14.313855
AGI_0m        12.578555
bathy_mean    10.022586
AGI_0m_seas    8.257366
sal_mean       7.104914
AGI_250m_seas  5.966422
ssh_mean       5.289787
AGI_0m_ann     4.800005
AGI_250m       4.606867
chl_mean       4.471600
bathy_sd       4.007598
mld_mean       2.955618
[1] "Partial plots"

[1] "Top most important pairwise interactions as identified by the model"
  var1.index    var1.names var2.index  var2.names int.size
1          8        AGI_0m          2   temp_mean  2100.42
2         10   AGI_0m_seas          9    AGI_250m   333.68
3          8        AGI_0m          6  bathy_mean   264.37
4         11 AGI_250m_seas          9    AGI_250m   150.53
5          8        AGI_0m          4    ssh_mean   145.22
6         12    AGI_0m_ann         10 AGI_0m_seas   128.88
7         13  AGI_250m_ann         12  AGI_0m_ann   127.32
8         11 AGI_250m_seas          3    sal_mean   106.97
[1] "External percent deviance explained"
[1] 0.5028644

[1] "TPR"
[1] 0.7155138
[1] "TSS"
[1] 0.7032217
[1] "Model evaluation using a 75/25 train/test data split"
gbm::gbm(formula = y.data ~ ., distribution = as.character(family), 
    data = x.data, weights = site.weights, var.monotone = var.monotone, 
    n.trees = target.trees, interaction.depth = tree.complexity, 
    shrinkage = learning.rate, bag.fraction = bag.fraction, verbose = FALSE)
A gradient boosted model with bernoulli loss function.
5350 iterations were performed.
There were 13 predictors of which 13 had non-zero influence.
       RMSE      Cor  C-index PredRatio DevianceExplained  PseudoR2
1 0.3280184 0.759673 0.932425  1.008251         0.5028644 0.5507659

HSI maps

List models

These dates run from January to December of each year, but I am happy to modify this if there are better ways to categorize/plot ENSO phases.

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

HSI difference maps

Neutral year (2013)

Strong La Niña year (-1.5 to -1.9 temperature anomaly; 2011)

Moderate/strong El Niño year (1.5 to 1.9 temperature anomaly; 2009)

AGI maps

Predictor variable relative importance

Performance metric comparisons